ABSTRACT


Pattern recognition and classification systems have been under development
for several years. This paper examines one of these systems,
which has been called an adaptive linear neuron, to determine how the
desired classification is achieved and how this system might be used in
the practical field of character recognition. Specifically, the following
ideas are discussed in this paper:

(1) The basic concepts of linear separability and iterative adaption
by an adaptive linear neuron (Adaline), as applied to the
pattern recognition and classification problem.

(2)  Four  possible  iterative  adaption  schemes  which  may  be  used  to 
train  an  Adaline. 

(3)  Use  of  Multiple  Adalines  (Madaline)  and  two  logic  layers  to 
increase  system  capability. 

(4)  Use  of  Adaline  in  the  practical  fields  of  Speech  Recognition, 
Weather  Forecasting  and  Adaptive  Control  Systems  and  the  possible 
use  of  Madaline  in  the  Character  Recognition  field. 
1. Introduction


Systems which can be trained to classify complex digital and analog
patterns have been under development for several years. One such system
has been proposed by B. Widrow and others at Stanford University [1].

In  this  system,  pattern  recognition  and  classification  are  accomplished 
through  the  use  of  integrative  memory  cells,  each  of  which  consists  of  a 
variable resistor whose value can be made, by suitable "training", to
become  a  useful  function  of  the  experiences  of  the  device  itself.  The 
training  of  these  devices  is  essentially  a  process  of  iterative  adaption. 

In  general,  training  consists  of  a  systematic  adjustment  of  each  of  the 
memory  elements  of  the  device  in  such  a  way  that  the  system  is  forced  to 
produce a predesignated output to each of a number of specific inputs.

After  the  device  has  successfully  adapted  to  a  large  number  of  training 
patterns,  it  has  the  ability  to  classify  inputs  which  are  related  to  the 
training  patterns,  as  well  as  the  training  patterns  themselves. 
Characteristics  of  the  trained  system  are  the  effectively  instantaneous 
classification  of  pattern  inputs  and  the  diffuse  storage  of  information 
throughout  the  memory  elements.  The  basic  system  developed  by  the  Stanford 
group  is  an  adaptive  threshold  device  called  an  adaptive  neuron,  or 
Adaline,  an  acronym  of  adaptive  linear  neuron.  A  discussion  of  the 
functions  performed  by  this  element  will  be  presented  later.  Systems 
containing  one  or  more  of  these  devices  are  being  developed  for  use  in 
speech  recognition,  weather  forecasting  and  electrocardiogram  analysis. 

They  have  also  been  used  as  trainable  controllers  in  adaptive  control 
systems.

In  this  paper  it  is  planned  to  review  the  progress  made  by  the 
Stanford  group  in  the  field  of  pattern  recognition  and  classification. 
Specifically, a close examination will be made of an Adaline, the basic
ideas  behind  its  use,  the  procedure  followed  in  training  it  and  the 
results  obtained  using  a  trained  system.  The  investigation  will  be 
prosecuted  experimentally  by  testing  digital  computer  simulated  models. 
In  addition,  a  similar  mathematical  model  will  be  applied  to  the 
practical  problem  of  classifying  the  ten  characters  zero  through  nine 
when  these  are  in  a  hand  printed  form.  Finally,  a  start  will  be  made  on 
the  problem  of  identifying  a  set  of  45  alphanumeric  characters,  also  in 
hand  printed  form. 
2. The Basic Adaptive Neuron.

The basic adaptive neuron, or Adaline, developed by the Stanford
group is shown in figure 1. The input $p_i$'s are binary, but for
convenience they are required to take the values $\pm 1$, rather than
the more conventional +1 and 0. An input pattern is defined to be a
particular set of values of the $p_i$'s. In the Adaline each $p_i$ is
multiplied by a corresponding (analog) weight, $w_i$, which can be
regarded as the current setting of the $i$-th memory cell, and which can
take both positive and negative values. The weight $w_0$ is called the
threshold weight. Thus the analog output of the summer is

$$\sum_{i=1}^{n} p_i w_i + w_0$$

and the digital or quantized output is

$$\operatorname{sgn}\left(\sum_{i=1}^{n} p_i w_i + w_0\right).$$

[Figure 1 not reproduced: Adaline with inputs $p_1, p_2, p_3, \ldots, p_n$.]
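The two outputs just defined can be sketched in a few lines of Python. This is only an illustration of the formulas, not the Stanford hardware: the function names, the convention of storing $w_0$ first in the weight list, and the mapping of an analog output of exactly zero to +1 are all assumptions.

```python
def analog_output(weights, pattern):
    """Return w_0 + sum_i p_i * w_i; weights[0] is the threshold weight w_0."""
    return weights[0] + sum(w * p for w, p in zip(weights[1:], pattern))


def quantized_output(weights, pattern):
    """Return the sign of the analog output (zero mapped to +1 by assumption)."""
    return 1 if analog_output(weights, pattern) >= 0 else -1


# Hypothetical weights for a two-input Adaline, chosen only for illustration.
weights = [1.0, 1.0, 1.0]
for pattern in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
    print(pattern, analog_output(weights, pattern), quantized_output(weights, pattern))
```

With these illustrative weights only the pattern with both inputs negative produces a -1 output.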
3. Linear Separability.


For  simplicity,  a  two  input  Adaline  will  be  discussed  first. 
Suppose  that  it  is  desired  to  classify  the  possible  input  patterns  into 
two classes:


(a) one or both $p_i$ positive, and

(b) both $p_i$ negative.


In this particularly simple case it is easy to see that if $w_0 = w_1 =
w_2 = 1$, then the corresponding digital outputs will be +1 and -1. It
will be said that this setting of the weights has classified the patterns
into two classes, (a) with a +1 output, and (b) with a -1 output. In
more complicated cases, the values of the weights would be adaptively
determined during training. However, is such classification always
possible? Suppose it is desired to classify the input patterns into
the two classes:


(a) both $p_i$ positive, or both $p_i$ negative, and

(b) $p_i$'s with alternate signs.


In this case it is not possible to find a set of $w_i$ which will yield a
digital output of +1 for the first class of input and -1 for the second.


[Figures 2 and 3 not reproduced. Figure 2 shows the line $p_1 + p_2 + 1 = 0$
separating the (a) patterns from the (b) pattern.]

Fig. 2. Separation Into Two Output Classes

Fig. 3. Not Separable By One Adaline
It will then be said that the patterns are not "linearly separable" into
these two classes and that a classification cannot be effected with a
single Adaline.

The  same  thing  can  be  illustrated  graphically  as  in  figures  2  and  3. 

The analog output must change sign as the line $\sum_{i=1}^{n} p_i w_i + w_0 = 0$
is crossed, and it can be seen that the analog output above the line in
figure 2 is positive, and that it is negative below the line. Since a
line can be drawn (actually an infinite number of lines can be drawn,
each with different $w_i$) separating the a and b pattern classes, it can be
concluded that the patterns are linearly separable into these two
classes. On the other hand, it is immediately obvious that the
classification of figure 3 is not possible.

It  is  of  interest  to  continue  the  classification  of  the  possible 
input  patterns  into  different  groups  of  two  classes  until  all  possible 
combinations  have  been  examined.  Using  this  procedure,  both  the  number 
of  linearly  separable  and  the  number  of  not  linearly  separable  examples 
may be determined for a two input Adaline. The digital outputs for the
two classes will be:

class  (a)  +1,  and 
class  (b)  -1. 

Table  1  contains  eight  examples  illustrating  this  concept  of  linear 
separability. 

It can be seen that examples 1 through 6 in the table could be repeated
with the desired digital outputs reversed, i.e., with a -1 output for
class (a) and a +1 output for class (b). This procedure would double
the number of linearly separable classifications, and with the addition
of the case illustrated as example 7 and its counterpart (for all inputs
Table 1. Separable and Not Separable Examples For Two Binary Inputs

(The Graphical Representation column of the original table is not reproduced.)

No.  Desired Classification of Input Patterns        One Set of $w_i$ For Linear Separation

1    (a) One or both $p_i$ +                          $w_0 = 1$, $w_1 = 1$, $w_2 = 1$
     (b) Both $p_i$ -

2    (a) Both $p_i$'s +, both $p_i$'s -,              $w_0 = 1$, $w_1 = 1$, $w_2 = -1$
         $p_1$ + and $p_2$ -
     (b) $p_1$ - and $p_2$ +

3    (a) All except in (b)                            $w_0 = 1$, $w_1 = -1$, $w_2 = -1$
     (b) Both $p_i$'s +

4    (a) Both $p_i$'s +, both $p_i$'s -,              $w_0 = 1$, $w_1 = -1$, $w_2 = 1$
         $p_1$ - and $p_2$ +
     (b) $p_1$ + and $p_2$ -

5    (a) Both $p_i$'s +, $p_1$ - and $p_2$ +          $w_0 = 0$, $w_1 = 0$, $w_2 = 1$
     (b) Both $p_i$'s -, $p_1$ + and $p_2$ -

6    (a) Both $p_i$'s +, $p_1$ + and $p_2$ -          $w_0 = 0$, $w_1 = 1$, $w_2 = 0$
     (b) Both $p_i$'s -, $p_1$ - and $p_2$ +

7    (a) All inputs                                   $w_0 = 1$; $w_1$, $w_2$ illegible in source

8    (a) Both $p_i$'s +, both $p_i$'s -               None
     (b) $p_i$'s with alternate signs
to be classified as (b)), would yield a total of fourteen examples which
are linearly separable. The interchange of the +1 and -1 of example 8
would yield a second example which is not linearly separable. Thus the
total number of separable and non-separable classifications into two
classes is equal to 16.

Suppose  now  that  the  number  of  binary  inputs  to  an  Adaline  is  3 
rather  than  2,  and  that  it  is  desired  to  classify  the  input  patterns 
into  the  two  classes: 


(a) $p_1 = +1$, $p_2 = \pm 1$, $p_3 = \pm 1$

(b) $p_1 = -1$, $p_2 = \pm 1$, $p_3 = \pm 1$

If the weights are chosen to be $w_1 = 1$ and $w_0 = w_2 = w_3 = 0$, it is
easily seen that the digital output will be +1 for all the patterns in
class (a), and -1 for those in class (b). Thus separation has been
achieved by the plane $p_1 = 0$.


Table 2. Number of Linearly Separable Classifications
for Different Numbers of Binary Inputs

Number of            Maximum Possible Number      Number of Known Linearly
Binary Inputs (n)    of Examples ($2^{2^n}$)      Separable Examples

1                    4                            4
2                    16                           14
3                    256                          104
4                    65,536                       1,882
5                    $4.3 \times 10^{9}$          94,572
6                    $1.8 \times 10^{19}$         15,028,134
In general, if there are n binary inputs to an Adaline, the two
classes will be separable if the $w_i$ can be chosen in such a way that the
(n-1)-dimensional hyperplane $w_1 p_1 + w_2 p_2 + \cdots + w_n p_n + w_0 = 0$
separates the two classes in n-space.

It was shown above that, for an Adaline with two binary inputs,
the total number of input-output situations was 16. An extension of
this reasoning from 2 to n binary inputs indicates that there are $2^{2^n}$
possible examples, again including both linearly and not linearly
separable types. Table 2, which was extracted from [2], lists this
figure together with the number of known linearly separable examples,
for n from 1 to 6.
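The counts for small n can be checked by brute force. The sketch below is an illustration only, not the method used in the cited reference: it tries every weight setting on a small integer grid and records which two-class assignments result. The function name and the grid {-1, 0, 1} are assumptions; that grid happens to be sufficient for n = 1 and n = 2 but is too coarse to find every separable case for larger n.

```python
from itertools import product


def count_separable(n, grid=(-1, 0, 1)):
    """Count distinct two-class labelings of the 2**n patterns that some
    weight vector (w_0, w_1, ..., w_n) drawn from `grid` realizes."""
    patterns = list(product((-1, 1), repeat=n))
    found = set()
    for w in product(grid, repeat=n + 1):
        sums = [w[0] + sum(wi * pi for wi, pi in zip(w[1:], p)) for p in patterns]
        # Ignore weight settings that leave any pattern exactly on the hyperplane.
        if all(s != 0 for s in sums):
            found.add(tuple(1 if s > 0 else -1 for s in sums))
    return len(found)


print(count_separable(1), count_separable(2))
```

For n = 1 and n = 2 this gives 4 and 14, in agreement with the first two rows of Table 2; the two labelings never found for n = 2 are the alternate-sign classification of example 8 and its reverse.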
4. Threshold and Dead Zone.


Linearly separable input patterns cannot normally be separated
unless the weights are carefully chosen. The process of changing the
weights until each weight has the value most likely to effect the linear
separation of the input patterns is called adaption or training.

One of the weights adjusted during adaption is $w_0$, the threshold
weight. First, in relation to the binary or quantized output, we see
that the output changes sign when $\sum_{i=1}^{n} p_i w_i = -w_0$, and the
threshold can therefore be regarded simply as a bias on the quantizer, as
shown in figure 4. Alternatively, if we consider the geometry of the
hyperplane, it can be seen that a change of the threshold from $w_0$ to
$w_0'$ effects a shift of the hyperplane to a new location parallel to its
original position, as illustrated in figure 5 for n = 2. In fact, the
perpendicular distance of the hyperplane from the origin, d, will be

$$d = \frac{w_0}{\left(w_1^2 + w_2^2 + \cdots + w_n^2\right)^{1/2}}.$$

It should also be noted that the slope of the hyperplane is not a
function of the threshold, but instead is determined by the settings of
the other weights.

Fig. 4. Effect on Quantizer of Changes in Threshold Weight


[Figure 5 not reproduced: hyperplane for $w_0$ and its parallel shift under a change of threshold.]


For the situation depicted in figure 6, any of the three hyperplanes
determined by the thresholds $w_0$, $w_0'$ and $w_0''$ will linearly separate the
input patterns into the two desired output classes a and b. However,
use of the $w_0'$ or $w_0''$ hyperplanes would not provide the safety that
might be needed in a practical situation. Some of the factors that
might cause classification errors are resolution of the quantizer and
drift in the circuit components. As a result, the input to the quantizer
must always be sufficiently different from zero so that a small change
in the quantizer input after adaption will not yield an incorrect digital
output. In addition, the problem of resolution and drift is magnified
by the fact that, as the number of binary inputs increases, the tolerance
required on each weight becomes more critical. This is discussed in [3].

Now what can be done to decrease the effect of such errors on the
Adaline system? One technique would be to utilize the $w_0$ hyperplane
as the separating plane, with the $w_0'$ and $w_0''$ hyperplanes as the outer
limits of a buffer zone about the separating plane. The system would
then be trained so that all possible input patterns would be classified
into classes above and below the buffer zone, with a consequent reduction
of the danger of producing an incorrect output from the quantizer.
For convenience, each of the spaces on either side of the separating
plane will be called a "dead zone" as labeled in figure 6.
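The geometry described in this section can be illustrated with a short Python sketch. The function names are hypothetical, the distance is taken as an absolute value so that it is non-negative (an assumption about sign convention), and the dead zone test simply asks whether a pattern's analog output lies outside the buffer zone.

```python
import math


def hyperplane_distance(weights):
    """Perpendicular distance of the hyperplane from the origin:
    |w_0| / sqrt(w_1^2 + ... + w_n^2); weights[0] is w_0."""
    return abs(weights[0]) / math.sqrt(sum(w * w for w in weights[1:]))


def clears_dead_zone(weights, pattern, dead_zone):
    """True if the magnitude of the analog output is at least dead_zone."""
    y = weights[0] + sum(w * p for w, p in zip(weights[1:], pattern))
    return abs(y) >= dead_zone


# Doubling the threshold weight doubles the distance; the other weights,
# which fix the slope, are unchanged.
print(hyperplane_distance([1.0, 1.0, 1.0]), hyperplane_distance([2.0, 1.0, 1.0]))
print(clears_dead_zone([1.0, 1.0, 1.0], (1, -1), 1.0))
```

A wider dead zone rejects weight settings that separate the classes only marginally, which is the safety margin the buffer zone is meant to provide.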
5. Multilevel Adaline.


It is often necessary to separate the input patterns into more than
two classes. In some cases this can be done effectively with a multilevel
Adaline, which separates the input patterns into classes according
to the analog output (level) assigned to each pattern. For this type of
separation to be possible the input patterns or sets of input patterns
must be linearly separable in a specific fashion. The several assigned
output levels may be regarded as equivalent to a corresponding set of
threshold weights. These in turn define a set of parallel hyperplanes
such as shown in figure 6 for $w_0$, $w_0'$ and $w_0''$. Therefore, the use of the
multilevel Adaline is restricted to the classification of inputs which
can be linearly separated from each other by a series of parallel
hyperplanes.

A  typical  use  of  a  multilevel  Adaline  would  be  to  classify  the 
written  digits  one  through  nine.  As  a  simple  test  of  this  concept  a 
computer  simulated  Adaline  was  used  to  separate  one  sample  of  each  of 
these  digits.  The  written  digits  were  converted  into  patterns  using  a 
seven  by  seven  input  space.  After  training,  all  nine  patterns  were 
correctly  classified  showing  that  this  particular  set  of  patterns  was 
separable  in  this  manner. 
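One way to picture a multilevel Adaline is sketched below in Python. The text does not say how an analog output is matched to an assigned level, so choosing the nearest level is an assumption made here for illustration, as are the function name and the example weights and levels.

```python
def classify_multilevel(weights, pattern, levels):
    """Return the index of the assigned level closest to the analog output.
    weights[0] is the threshold weight w_0 (nearest-level rule is assumed)."""
    y = weights[0] + sum(w * p for w, p in zip(weights[1:], pattern))
    return min(range(len(levels)), key=lambda k: abs(levels[k] - y))


# Hypothetical two-input example with three assigned output levels.
weights = [0.0, 1.0, 1.0]
levels = [-2.0, 0.0, 2.0]
for pattern in [(-1, -1), (1, -1), (1, 1)]:
    print(pattern, classify_multilevel(weights, pattern, levels))
```

Because a single weight vector is shared by all classes, the class boundaries are parallel hyperplanes, which is the restriction noted above.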
6. Adaline Adaption.

The  previous  sections  contained  a  discussion  of  Adaline  fundamentals 
and  the  concept  of  separating  pattern  classes  by  the  appropriate  setting 
of the weight elements. Training or adaption is the process of iteratively
determining suitable values for the weight elements. This
section  will  formulate  the  adaption  problem  and  then  consider  four 
possible  adaption  or  training  procedures. 

Adaline  application  in  a  pattern  recognition  problem  involves  the 
following  sequence  of  events.  Firstly,  training  patterns  are  chosen  and 
assigned  digital  outputs,  i.e.  plus  or  minus  one.  Then  an  adaption 
process  is  selected  and  Adaline  is  trained  until  it  either  correctly 
identifies  all  the  patterns,  or  until  the  training  is  terminated  due  to 
a  failure  of  the  training  scheme  to  converge.  If  the  training  was  not 
so  terminated,  the  Adaline  would  now  have  the  capability  to  correctly 
identify  all  the  training  patterns  (thus  proving  the  pattern  classes  to 
be  separable)  and  in  addition,  the  ability  to  recognize  a  number  of 
"similar" patterns. In fact, one of Adaline's main advantages is the
ability  to  identify  patterns  it  has  not  been  trained  on.  Other  methods 
of  pattern  recognition  such  as  "table  look  up"  do  not  generally  have  this 
capability. 

One  question  now  to  be  answered  is,  what  constitutes  a  "similar" 
pattern?  One  measure  of  the  similarity  of  two  patterns  would  be  the 
number  of  pattern  elements  that  would  have  to  be  changed  in  one  pattern 
to  make  the  two  patterns  identical.  Figure  7  displays  three  patterns, 
each  containing  16  pattern  elements.  Pattern  (a)  differs  from  pattern 
(b)  by  one  pattern  element,  while  pattern  (a)  differs  from  pattern  (c) 
by  eight  pattern  elements.  Patterns  (a)  and  (b)  would  be  classified  as 
"similar" patterns while patterns (a) and (c) would be classified as
dissimilar patterns if the above criterion were used. In some cases,
however, a pattern might be regarded as "similar" to a training pattern
if it were related to it, for example, by a simple rotation or translation.
In that case patterns (a) and (c) would be regarded as similar.
This criterion might be used if the Adaline was being specifically
trained to recognize patterns even when translated or rotated from their
normal configuration.


[Figure 7 not reproduced: three 16-element patterns labeled (a), (b) and (c).]

Fig. 7. Similar and Dissimilar Patterns
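The first similarity measure above, the number of pattern elements that differ, is simple to compute. The Python sketch below uses made-up 16-element patterns rather than the ones in figure 7, which are not recoverable here; the function name is also an assumption.

```python
def pattern_distance(a, b):
    """Number of pattern elements that must change to make a and b identical."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same number of elements")
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical 16-element patterns with elements +1 / -1.
pattern_a = [1, 1, 1, -1, -1, 1, -1, -1, -1, 1, -1, -1, 1, 1, 1, -1]
pattern_b = list(pattern_a)
pattern_b[0] = -1                       # one element changed
pattern_c = [-p for p in pattern_a[:8]] + pattern_a[8:]   # eight elements changed

print(pattern_distance(pattern_a, pattern_b), pattern_distance(pattern_a, pattern_c))
```

This measure does not capture similarity under rotation or translation, which is why the text treats that as a separate criterion.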

Finally,  a  "similar"  pattern  is  often  generated  by  the  contamination 
of  a  training  pattern  by  noise.  For  this  reason  "similar"  patterns  are 
sometimes  referred  to  as  "noisy"  patterns.  The  ability  of  Adaline  to 
recognize  similar  patterns  is  clearly  an  extremely  complex  function  of 
the  total  number  of  pattern  elements  in  the  input  pattern,  the  number 
and  complexity  of  the  training  patterns,  the  value  of  the  weight 
elements,  and  the  degree  of  similarity  of  the  input  pattern  to  a  train¬ 
ing  pattern. 

It  was  shown  earlier  that  the  analog  output  is  imposed  on  a 
quantizer  whose  digital  output  is  either  plus  or  minus  one.  It  is 
usually  desirable  to  endow  the  quantizer  with  a  dead  zone,  so  that  some 
finite  magnitude  of  the  analog  output  is  required  before  the  digital 
output becomes non-zero. During the training or adaption phase the
analog output of each training pattern is made equal to or greater than
an "adaption dead zone". After training, a smaller value of dead zone
may be introduced which will often enhance the probability of correctly
identifying similar patterns. Thus patterns will be classified according
to the relationship of their analog outputs to a "recognition dead
zone".
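A quantizer with a dead zone can be sketched as a three-valued function. The name below is hypothetical, and returning 0 for an output inside the dead zone is an interpretation of the statement that the digital output is non-zero only beyond some finite magnitude.

```python
def quantize_with_dead_zone(analog, dead_zone):
    """Return +1 or -1 when |analog| reaches the dead zone value, else 0."""
    if analog >= dead_zone:
        return 1
    if analog <= -dead_zone:
        return -1
    return 0


# The same analog output can be rejected under the larger adaption dead zone
# and accepted under a smaller recognition dead zone (values are illustrative).
adaption_dead_zone = 1.0
recognition_dead_zone = 0.25
print(quantize_with_dead_zone(0.4, adaption_dead_zone),
      quantize_with_dead_zone(0.4, recognition_dead_zone))
```

Using the smaller value at recognition time lets a noisy pattern whose analog output has shrunk still receive a definite classification.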

The adaption or training problem can be formulated as:

Given: A collection of patterns and the desired digital
output for each pattern.

Find: The value of the weight elements such that the
expression $w_0 + \sum_{i=1}^{n} p_i w_i$ yields the appropriate analog
output for each input pattern.

The  weight  values  are  usually  determined  by  an  iterative  process 
based  on  a  comparison  of  the  actual  analog  output  for  each  training 
pattern with the corresponding desired analog output. In other words,
an input pattern is imposed on Adaline, the analog output is examined,
the weight elements are changed if required, and then another pattern is
imposed. This process is continued until the analog output of each
pattern is acceptable without further adaption, or until it is determined
that the process will not converge. The adaption process will be defined
as having converged when each training pattern generates an acceptable
output. It should be noted that an adaption process may not converge to
a  solution  even  though  the  pattern  classes  themselves  are  linearly 
separable.  In  such  a  case  a  different  adaption  technique  would  have  to 
be  employed. 
7. Specific Adaption Schemes.


Common  to  each  of  the  adaption  schemes  discussed  below  are  the 
following  rules. 

When an analog output is unacceptable:

1. Change each of the weight elements by the same absolute
magnitude.

2. Increase or decrease each weight according to whether the
product of the corresponding pattern element by the desired
change of the analog output is positive or negative. For
example, if the analog output is to be increased and $p_i$
is -1, then $w_i$ would be decreased.

A description of the four adaption schemes listed in [4] now follows.

Minimum Square Error Adaption

This process adjusts the weight elements in such a manner that the
analog output for each pattern is driven toward the same absolute
magnitude, called the adaption dead zone. The error at each step is defined
as the difference between the adaption dead zone and the actual analog
output, and the change made to each weight element is expressed by the
following equation:

$$|\Delta w_i| = \frac{\text{Error} \times \text{Proportional Constant}}{\text{Total Number of Weights}}$$

If the proportional constant is greater than zero and less than two,
the process will converge provided that the patterns are both linearly
separable and meet an additional requirement that will be discussed
later. In the case where the weight elements are continuously variable,
it has been experimentally determined (Appendix III) that the number of
iterations required to converge in a typical situation approaches a
minimum when the proportional constant is approximately equal to one.
The criterion requiring identical magnitudes for the analog outputs
for  all  input  patterns  causes  an  extremely  large  number  of  iterations. 
Therefore,  a  tolerance  is  usually  established  around  the  adaption  level, 
defined  as  minimum  square  error  bound,  and  no  changes  are  made  to  the 
weight  elements  if  the  analog  output  falls  within  this  bound. 

This adaption scheme also suffers from the disadvantage that there
is no assurance of convergence, even if the pattern classes are known to
be linearly separable. If the number of input patterns is equal to or
greater than one plus the number of weight elements, there is a
possibility that the adaption will not converge. The two-input pattern space
considered in section 3 will be used to illustrate this limitation. It
can be seen in Figure 8 (a) that the classes can be separated in such a
way that the analog output has the same magnitude for each pattern. This
is not true in the case of the configuration of Figure 8 (b), and the
minimum square error procedure, as described above, will not converge
even though the classes are linearly separable.
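The minimum square error procedure and its limitation can be illustrated with a small Python sketch. It is not a transcription of the Fortran in Appendix I: the function name, the starting weights of zero, the treatment of the threshold weight as a weight on a constant +1 input, and the default parameter values are all assumptions.

```python
def train_min_square_error(patterns, desired, dead_zone=1.0, k=1.0,
                           bound=0.01, max_epochs=200):
    """Drive each analog output toward desired * dead_zone.
    Returns (weights, converged); weights[0] is the threshold weight."""
    n_weights = len(patterns[0]) + 1
    w = [0.0] * n_weights
    for _ in range(max_epochs):
        changed = False
        for p, d in zip(patterns, desired):
            a = (1,) + tuple(p)               # threshold input fixed at +1
            y = sum(wi * ai for wi, ai in zip(w, a))
            error = d * dead_zone - y
            if abs(error) > bound:            # outside the minimum square error bound
                step = k * error / n_weights  # same magnitude for every weight
                w = [wi + step * ai for wi, ai in zip(w, a)]
                changed = True
        if not changed:
            return w, True
    return w, False


patterns = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
# Classes split on the sign of the first input: equal magnitudes are attainable.
print(train_min_square_error(patterns, [1, 1, -1, -1]))
# "One or both inputs positive": separable, but equal magnitudes are not attainable.
print(train_min_square_error(patterns, [1, 1, 1, -1]))
```

The first call converges to weights close to (0, 1, 0). The second never satisfies the bound, because no weight setting gives all four patterns the same output magnitude, which is the situation the text attributes to Figure 8 (b).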

The  next  three  adaption  procedures  have  been  proved  to  converge  to 
a  solution  provided  that  the  pattern  classes  are  linearly  separable. 

These  schemes  compare  the  adaption  dead  zone  value  with  the  actual  analog 
output and, if the magnitude of the analog output is less than the adaption
dead zone value, some method will be employed to adjust the weights
in  such  a  way  as  to  increase  the  analog  output  magnitude.  However,  if 
the  analog  output  magnitude  is  equal  to  or  greater  than  the  adaption  dead 
zone,  no  changes  are  made. 
Fig. 8. Separation With Minimum Square Error Procedure.

Incremental Adaption

When the analog output magnitude is less than the adaption dead
zone value, all the weight elements are changed by an amount:

$$|\Delta w_i| = \frac{\text{Incremental Constant} \times \text{Adaption Dead Zone}}{\text{Total Number of Weight Elements}}$$

The number of iterations required to converge, and the final value of the
weight elements, is a function of the incremental constant. Note that
the corrections are independent of the errors.
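A minimal Python sketch of incremental adaption follows. As with the other sketches, the function name, zero starting weights, and the constant +1 input for the threshold weight are assumptions; the test "output magnitude less than the dead zone" is implemented as the output measured in the desired direction, so that outputs of the wrong sign are also corrected.

```python
def train_incremental(patterns, desired, dead_zone=1.0,
                      incremental_constant=1.5, max_epochs=1000):
    """Fixed-size corrections until every pattern clears the dead zone.
    Returns (weights, converged); weights[0] is the threshold weight."""
    n_weights = len(patterns[0]) + 1
    increment = incremental_constant * dead_zone / n_weights
    w = [0.0] * n_weights
    for _ in range(max_epochs):
        changed = False
        for p, d in zip(patterns, desired):
            a = (1,) + tuple(p)                 # threshold input fixed at +1
            y = sum(wi * ai for wi, ai in zip(w, a))
            if d * y < dead_zone:               # not yet beyond the dead zone
                w = [wi + d * increment * ai for wi, ai in zip(w, a)]
                changed = True
        if not changed:
            return w, True
    return w, False


patterns = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
print(train_incremental(patterns, [1, 1, 1, -1]))
```

With an incremental constant of 1.5 each correction is exactly 0.5, and for the "one or both inputs positive" classification the sketch stops after two passes with all three weights equal to 1.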
Relaxation Adaption


This procedure is similar to the minimum square error adaption
except that no corrections are made when the analog output magnitude
is equal to or greater than the adaption dead zone. The adaption
process will converge if the proportional constant is between zero and two.
The weight elements are changed, if required, by an amount:

$$|\Delta w_i| = \frac{(\text{Adaption Dead Zone} - \text{Analog Output}) \times \text{Proportional Constant}}{\text{Total Number of Weight Elements}}$$
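A single relaxation correction can be sketched as follows. The function name and the signed form of the rule (the output is measured in the desired direction, so one formula covers both classes) are assumptions, as is the constant +1 input for the threshold weight.

```python
def relaxation_step(w, pattern, d, dead_zone=1.0, k=1.0):
    """Return the weights after one relaxation correction for one pattern.
    w[0] is the threshold weight; d is the desired digital output (+1 or -1)."""
    a = (1,) + tuple(pattern)                  # threshold input fixed at +1
    y = sum(wi * ai for wi, ai in zip(w, a))
    margin = d * y                             # output in the desired direction
    if margin >= dead_zone:
        return list(w)                         # already acceptable: no change
    step = d * k * (dead_zone - margin) / len(w)
    return [wi + step * ai for wi, ai in zip(w, a)]


w = relaxation_step([0.0, 0.0, 0.0], (1, -1), 1)
print(w, w[0] + w[1] * 1 + w[2] * -1)
```

With a proportional constant of one, a single correction moves the output for that pattern onto the dead zone value, up to rounding; smaller corrections near the solution are the reason for the slow convergence discussed next.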

Modified  Relaxation  Adaption 

A disadvantage of the minimum square error and relaxation adaption
procedures is the large number of iterations required for convergence.
When the difference between the desired analog output and the actual
analog output is small, a small correction is made. Thus, the closer
the process gets to the solution, the smaller the magnitude of the
corrective changes. The modified relaxation adaption procedure
overcomes this difficulty by correcting to a value larger than the adaption
dead zone. This value, usually 1.1 to 1.5 times the adaption dead zone
magnitude, is defined as the adaption level. No corrections are made if
the output magnitude exceeds the adaption dead zone. The equation for
the change of weight elements, where the proportional constant should
again be between zero and two for convergence, is:

$$|\Delta w_i| = \frac{(\text{Adaption Level} - \text{Analog Output}) \times \text{Proportional Constant}}{\text{Total Number of Weight Elements}}$$
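The modified procedure can be sketched as a complete training loop in Python. The function name, zero starting weights, the level factor of 1.25 (a value inside the 1.1 to 1.5 range quoted above), and the constant +1 threshold input are assumptions made for illustration.

```python
def train_modified_relaxation(patterns, desired, dead_zone=1.0,
                              level_factor=1.25, k=1.0, max_epochs=1000):
    """Correct toward level_factor * dead_zone whenever a pattern's output,
    measured in the desired direction, is below dead_zone.
    Returns (weights, converged); weights[0] is the threshold weight."""
    n_weights = len(patterns[0]) + 1
    level = level_factor * dead_zone           # the adaption level
    w = [0.0] * n_weights
    for _ in range(max_epochs):
        changed = False
        for p, d in zip(patterns, desired):
            a = (1,) + tuple(p)                # threshold input fixed at +1
            margin = d * sum(wi * ai for wi, ai in zip(w, a))
            if margin < dead_zone:
                step = d * k * (level - margin) / n_weights
                w = [wi + step * ai for wi, ai in zip(w, a)]
                changed = True
        if not changed:
            return w, True
    return w, False


patterns = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
print(train_modified_relaxation(patterns, [1, 1, 1, -1]))
```

Because every correction overshoots the dead zone, the loop stops after a finite number of corrections on this separable example instead of creeping toward the boundary.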
8. Evaluation of Adaption Procedures.

The different characteristics of the adaption procedures were
investigated by the solution of some typical problems. In all cases,
Adaline was simulated on a CDC 1604 digital computer using one of the
adaption procedures which are defined by the Fortran statements listed
in Appendix I. In one example the adaption process is examined after
each iteration (Appendix II), while in another a test is described of
Adaline's ability to correctly identify a number of patterns upon which
it had not been trained (Appendix IV).

If training patterns are not separable, the training process will
not converge and must be arbitrarily terminated. The Adaline will then
fail to identify all of the training patterns correctly, but if the
number of failures is small this may be tolerable in some applications.
When it comes to the recognition of noisy versions of the training
patterns, it must be expected that Adaline will only recognize a
statistical percentage of the similar patterns presented. The only method
of ensuring that Adaline will correctly identify all possible input
patterns is to train on all the conceivable patterns. But this is
then a form of "table look up" recognition, which can be performed by
other means without the necessity of employing an iterative scheme.
Appendix V summarizes the results of training Adaline on pattern classes
that are not separable.

No attempt will be made to promote the use of one adaptive procedure
in lieu of the others. It can be noted that the modified relaxation
procedure usually requires the fewest iterations to converge,
but results in a wide spread in the analog output values. On the other
hand, the minimum square error adaption requires more iterations to
converge but has a narrow spread in the analog output values.
9. Possible Adaline Applications.

The output of a trained Adaline can be regarded as a binary digit,
or logical decision, whose value depends upon the pattern input to the
device. It follows, therefore, that an Adaline can, in principle, be
applied in any situation where a decision is to be made on the basis of
some "input" to the decision making device. In particular, the Adaline
concept is valuable in situations where performance can or must be
improved as experience accumulates.

The  basic  difficulties  in  the  application  of  Adalines  relate  firstly 
to  the  problem  of  converting  an  input  into  a  pattern,  and  secondly  to  the 
development  of  a  suitable  training  procedure  which  will  ensure  that  the 
Adaline  does  in  fact  improve  its  performance  with  practice.  The  following 
three  sections  will  consider  a  few  of  the  possible  Adaline  applications. 
10. Servo-Mechanism Controller


A  single  Adaline  with  its  digital  output  can  be  used  as  a  bang-bang 
servo-mechanism  controller  as  in  figure  9. 


Fig.  9.  Basic  Adaline  Servo-Mechanism  Controller 

Graduate students at Stanford University have used an Adaline in
such a control problem. The plant consisted of a rolling cart powered by
a reversible electric motor. Installed on the cart was an inverted
pendulum. The Adaline controller was trained to keep the pendulum in a
vertical position without extreme excursion of the cart in either
direction. Four plant variables were measured: the position and velocity
of the cart and of the pendulum.

The direction in which the electric motor should rotate, and thus the
desired Adaline digital output, is a function of the four measured
variables. The value of each variable can be cataloged into one of
several distinct levels, and each level can in turn be represented by a
code consisting of a series of pattern elements. These pattern codes
must be carefully chosen to ensure that the pattern classes are linearly
separable. The complete input pattern is composed of the pattern
elements of the four variables.

The Adaline is trained by permitting it to observe the performance
of another type of controller. The "correct" response is then available
at all times and the Adaline weights can be adjusted to bring the Adaline
output into agreement. After the training is completed, the Adaline can
take over the operation of the plant.
11. Speech Recognition.

A real time speech recognition system [5] has been constructed at
Stanford University. The system, which consists of several parallel
Adalines, has the capability of converting speech into typewritten words.
Since the operation of parallel Adalines will be discussed in a later
section, only the coding technique will be discussed here.

The main problem encountered in coding was the choice of parameters
to describe the sound of a spoken word. Bandpass filters were employed
to separate the sound energy into eight frequency bands and the sound
intensity in each band was then digitally coded according to the
amplitude level. Four levels were chosen, corresponding to the three-bit
patterns 000, 001, 011, or 111, where 1 equals +1 and 0 equals -1.

Ten  samples  were  taken  during  the  utterance  of  a  word,  so  that  each 
filter  generated  30  pattern  elements.  The  complete  pattern  from  all 
eight  filters  therefore  consisted  of  240  pattern  elements. 
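
The coding just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `encode_word`, the sample-major ordering of the pattern elements, and the quantized measurements in `samples` are hypothetical and do not come from the Stanford system. Only the four three-bit codes and the counts (eight bands, ten samples, 240 elements) are taken from the text.

```python
# Codes for the four amplitude levels, as given in the text:
# 000, 001, 011, 111, with bit 1 -> +1 and bit 0 -> -1.
LEVEL_CODES = {
    0: (-1, -1, -1),
    1: (-1, -1, +1),
    2: (-1, +1, +1),
    3: (+1, +1, +1),
}


def encode_word(levels):
    """Turn 10 samples of 8 quantized band levels (0..3) into pattern elements.

    The ordering (all bands of sample 1, then sample 2, ...) is an assumption.
    """
    pattern = []
    for sample in levels:
        for level in sample:
            pattern.extend(LEVEL_CODES[level])
    return pattern


# Hypothetical quantized measurements: 10 samples x 8 frequency bands.
samples = [[(s + b) % 4 for b in range(8)] for s in range(10)]
pattern = encode_word(samples)
print(len(pattern))  # 10 samples x 8 bands x 3 bits
```

Each filter contributes 3 elements per sample, hence 30 elements per filter over the ten samples, in agreement with the count given above.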
12. Weather Forecasting.

Adalines have been applied in the area of weather forecasting to
the extent of predicting "fair" or "rain" in one locality [6]. In
this application parallel Adalines were trained to interpret weather
maps. In particular, they were trained to "read" surface pressure maps
of a 500,000 square mile area.

The weather map was divided into 48 regions each of approximately
600 square miles. Then, the expected range between the highest and
lowest pressures was divided into ten levels, each of which was
represented by one of the ten digits, 0 through 9. Thus, each input
pattern contained 48 pattern elements, each of which could acquire one
of ten values (as compared with the usual two).

The results obtained from the Adalines were comparable with those
obtained from "human" weather forecasters.
13. Classification When Pattern Classes Are Not Linearly Separable.

Previous sections have discussed the basic structure of an Adaline
and have detailed several schemes for the use of an Adaline in pattern
recognition and classification. The discussion thus far has been limited
to the separation of sets of input patterns into classes by a single
Adaline. It was found that the classification of input patterns into
two classes could not always be effected with a single Adaline. This
situation was illustrated by the example shown in figure 3, which is not
linearly separable. In that example it was desired to classify all input
patterns into the two classes:

(a) both p_i positive, or both p_i negative, and

(b) p_i's with opposite signs.

This separation can be accomplished by a system using two Adalines in
parallel. The weights of the two Adalines define two hyperplanes, which
effect the desired separation as illustrated in figure 10.

Fig. 10. Linear Separability With Multiple Adalines

The overall system consists of two logic layers, the first layer being
composed of Adalines with adaptive elements and the second layer made up
of a fixed logic element or threshold device. In the first layer each
Adaline attempts a classification of input patterns into linearly
separable classes, while the second layer combines the outputs of the
first layer to complete the desired classification. Using the example of
figure 10, the inputs would be classified in the first logic layer as
follows:

Adaline One   (a) both p_i positive
              (b) all others

Adaline Two   (a) both p_i negative
              (b) all others

This  classification  would  place  the  two  hyperplanes  as  in  figure  10. 

The next step would be to insert the Adaline outputs into the second
layer, which is an "or" gate* (in this case), in order to realize the
desired output classification. This process is summarized in table 3.


Table 3.  Separation Using Two Logic Layers

                            Outputs
Inputs     Adaline #1   Adaline #2   Or Gate   Desired

 1   1          1           -1          1         1
 1  -1         -1           -1         -1        -1
-1   1         -1           -1         -1        -1
-1  -1         -1            1          1         1
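
The two-layer arrangement of table 3 can be checked with a short Python sketch. The weight values `W1` and `W2` below are assumptions chosen for illustration (the text does not give the weights behind figure 10); any weights that place the two hyperplanes as described would serve equally well.

```python
def adaline(weights, inputs):
    """Quantized Adaline output: +1 if the weighted sum is positive, else -1.

    weights[0] is a bias weight; the remaining weights multiply the inputs.
    """
    total = weights[0] + sum(w * p for w, p in zip(weights[1:], inputs))
    return 1 if total > 0 else -1


def or_gate(outputs):
    """Fixed second-layer element: +1 if one or more inputs are positive."""
    return 1 if any(o > 0 for o in outputs) else -1


# Hypothetical first-layer weights (bias, w1, w2).
W1 = (-1.0, 1.0, 1.0)    # positive only when both inputs are +1
W2 = (-1.0, -1.0, -1.0)  # positive only when both inputs are -1


def classify(p1, p2):
    a1 = adaline(W1, (p1, p2))
    a2 = adaline(W2, (p1, p2))
    return a1, a2, or_gate((a1, a2))


for p in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
    print(p, classify(*p))
```

With these assumed weights the printed rows agree with the Adaline #1, Adaline #2 and Or Gate columns of table 3.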

It is apparent that the choice of weights (hyperplanes) in the
first layer Adalines is dependent upon the logic device used in the
second layer and vice versa. It follows that the choice of a logic
device and the establishment of a training procedure for the Adalines
may prove very difficult if the result is not known in advance, as it
was here. In fact this would seem to be the major difficulty in this
scheme. One approach to this problem now follows.

*An "or" gate is a device which gives a positive output if one or
more of its n inputs are positive.

There  are  many  devices  which  could  be  used  in  the  second  layer  to 
combine  the  Adaline  outputs,  so  that  it  is  important  to  estimate  which 
of  these  devices  is  likely  to  be  the  best  one  for  this  purpose.  The 
previously  mentioned  "or"  element  can  be  regarded  as  a  special  quantizer 
whose  output  is  -1  unless  at  least  one  Adaline  output  is  positive.  At 
the  other  extreme  would  be  an  element  whose  output  is  -1  unless  all  n 
Adaline  outputs  are  positive. 

In trying to find the "best" quantizing device for use in the second
logic layer it seems natural to try a device whose output becomes +1 when
about n/2 of the Adaline outputs become positive. It has been found [7]
that for a system with an odd number of Adalines, a simple output
majority, or a second layer "threshold" of (n+1)/2, will realize the
classification of the greatest number of inputs. Similarly, for an even
number of Adalines, second layer thresholds of n/2 or (n+2)/2 will
realize the highest percentage of classifications. It should be noted
that the criterion used in the above reference to determine the "best"
second layer threshold for general use was the classification of the
maximum number of input pattern sets. However, for any specific
patterns, or sets of patterns, the threshold which is "best" might be
anywhere from 1 to n.
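
The family of second-layer devices discussed above can be written as a single threshold element. The following Python sketch is illustrative only; the names `threshold_element` and `majority_threshold` are hypothetical, and for an even number of Adalines the sketch arbitrarily picks n/2 out of the two thresholds the text allows.

```python
def threshold_element(outputs, k):
    """Second-layer quantizer: +1 when at least k Adaline outputs are positive."""
    positives = sum(1 for o in outputs if o > 0)
    return 1 if positives >= k else -1


def majority_threshold(n):
    """(n+1)/2 for odd n; for even n this sketch chooses n/2."""
    return (n + 1) // 2 if n % 2 else n // 2


outputs = [1, -1, -1]
print(threshold_element(outputs, 1))                        # "or" element, k = 1
print(threshold_element(outputs, len(outputs)))             # all-positive element, k = n
print(threshold_element(outputs, majority_threshold(3)))    # majority element
```

Setting k = 1 gives the "or" element, k = n gives the element at the other extreme, and intermediate values of k give the majority-type devices.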

There is still the problem of choosing the "best" adaption scheme
for the first layer. One method that has been suggested makes use of
both the analog output data from each Adaline and the digital output
desired from the overall system. In the case where the second layer
element is an "or" gate, the procedure to be followed will depend on
the desired system output. Thus, if the desired system output is -1,
all Adalines must give a negative output and any Adaline which is giving
a positive output will have to be adapted. However, when all first layer
outputs are negative and the desired system output is +1, only one
Adaline need be adapted. In this case it has been suggested that
adaption be confined to the Adaline whose analog output needs the
smallest change to establish the required condition. If the second layer
threshold is a majority logic device, the same general procedure will be
followed. If, to obtain the desired system output, the output of at
least k Adalines must be revised, the k Adalines whose outputs require
the least change will be adapted. The idea behind this procedure is that
adaption should take place with the minimum of disturbance to the
previously established pattern of weights.
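
The selection rule described above might be sketched as follows. This is an interpretation, not the procedure of the cited work: the function name `adalines_to_adapt` is hypothetical, "least change" is taken to mean the analog output closest to zero, and an analog output of exactly zero is treated as negative.

```python
def adalines_to_adapt(analog, desired, k_threshold):
    """Return indices of the first-layer Adalines that should be adapted.

    analog      : analog outputs of the first-layer Adalines
    desired     : desired system output, +1 or -1
    k_threshold : second-layer threshold (1 for an "or" gate)
    """
    positive = [i for i, s in enumerate(analog) if s > 0]
    negative = [i for i, s in enumerate(analog) if s <= 0]
    if desired > 0:
        shortfall = k_threshold - len(positive)
        if shortfall <= 0:
            return []  # system output is already +1
        # Negative outputs nearest zero need the least change to turn positive.
        candidates = sorted(negative, key=lambda i: abs(analog[i]))
        return sorted(candidates[:shortfall])
    excess = len(positive) - k_threshold + 1
    if excess <= 0:
        return []  # system output is already -1
    # Positive outputs nearest zero need the least change to turn negative.
    candidates = sorted(positive, key=lambda i: analog[i])
    return sorted(candidates[:excess])


# "Or" gate, all outputs negative, desired +1: only the nearest Adaline adapts.
print(adalines_to_adapt([-20.0, -5.0, -12.0], +1, 1))
# "Or" gate, desired -1: every Adaline with a positive output adapts.
print(adalines_to_adapt([3.0, -5.0, 7.0], -1, 1))
```

The chosen Adalines would then be adapted individually with one of the single-Adaline procedures.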

It may be noted that if it were not for the difficulty in choosing
a suitable logic for the second layer, and a training procedure for the
first, it would be theoretically possible to establish any desired
classification if sufficient Adalines were employed in the first layer.
In an extreme case, for example, the n+1 hyperplanes defined by the
weights of n+1 Adalines could separate one input from all others in
n-space if the weights could be chosen properly, and if the second logic
layer properly interpreted the outputs of the Adalines.
14. Madaline.


Multiple Adalines in parallel, or a Madaline as this combination is
sometimes called, may also be used to classify a set of input patterns
into more than two output classes. If the input patterns can be
appropriately separated by each of the Adalines, Madaline can accomplish
the desired classification with the help of a digital output coding
scheme such as that illustrated in figure 11 and table 4. It is easily
seen that the maximum number of individual output codes available is 2^m,
where m is the number of Adalines in the Madaline. As m = 3 in figure 11
and table 4, it is readily apparent that eight input patterns or pattern
sets can be classified into eight different digital output combinations
in this situation.

Here  the  coding  scheme  is  predetermined,  so  that  the  training 
follows  the  procedure  devised  for  a  single  Adaline.  That  is,  each 
Adaline  is  trained  individually  to  generate  the  appropriate  response  to 
each  of  the  training  patterns. 

The problem which may arise is related to the choice of codes for
the several pattern sets. If these are not properly chosen, then the
individual Adalines may well be attempting to separate the patterns into
classes which are not linearly separable. A change in the codes may
eliminate the problem in any given situation, but no systematic
procedure for choosing the coding scheme has been found.

Also to be considered is the fact that the coding scheme may affect
the number of Adalines needed, and/or the number of adaptions necessary
during training. These considerations, however, seem to be less
important than that of the previous paragraph.

For  the  example  of  figure  11  all  eight  possible  binary  output  codes 
are  used.  Note,  however,  that  there  are  many  possible  choices  for  the 
code  to  be  assigned  to  each  pattern  or  pattern  set.  One  such  choice  is 
shown  in  table  4. 


Table 4.  Typical Madaline Output Coding Scheme

             Adaline 1   Adaline 2   Adaline 3
             Output      Output      Output

pattern 1        1          -1           1
pattern 2       -1           1          -1
pattern 3        1          -1          -1
pattern 4       -1           1           1
pattern 5        1           1          -1
pattern 6       -1          -1           1
pattern 7        1           1           1
pattern 8       -1          -1          -1
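
Reading a Madaline output against the coding scheme of table 4 amounts to a table lookup, as the following Python sketch shows. The dictionary reproduces table 4; the names `TABLE_4` and `classify` are illustrative only.

```python
# Output code (Adaline 1, Adaline 2, Adaline 3) -> pattern number, from table 4.
TABLE_4 = {
    (1, -1, 1): 1,
    (-1, 1, -1): 2,
    (1, -1, -1): 3,
    (-1, 1, 1): 4,
    (1, 1, -1): 5,
    (-1, -1, 1): 6,
    (1, 1, 1): 7,
    (-1, -1, -1): 8,
}


def classify(adaline_outputs):
    """Map the three digital Adaline outputs to a pattern number."""
    return TABLE_4[tuple(adaline_outputs)]


print(classify([1, 1, -1]))
print(len(TABLE_4) == 2 ** 3)  # all eight available codes are used
```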

To summarize, each of the Adalines is trained to separate the input
patterns into two classes: (a) +1 output and (b) -1 output. Then, as an
input pattern is imposed on the Madaline, the Adalines simultaneously
determine their output responses. The outputs can then be examined to
classify the imposed pattern. The classification of the written digits,
one through eight, suggested herein was successfully accomplished using
three computer-simulated seven by seven Adalines.
15. Character Recognition

In this section the application of a Madaline to the recognition and
classification of hand printed letters, numbers and punctuation
symbols will be discussed. For definiteness, the alpha-numeric character
set chosen corresponded to that used in Fortran computer coding, and the
aim was to evaluate the possibility of "reading" hand printed Fortran
symbols using a Madaline.

The  investigation  was  divided  into  two  parts.  The  first  consisted 
of  an  evaluation  of  the  feasibility  of  classifying  the  ten  numeric 
characters  using  a  Madaline,  while  the  second  consisted  of  an  attempt 
to  classify  45  Fortran  characters  using  a  similar  device. 

In the first part of the investigation the Madaline consisted of
four Adalines, the minimum number required, each with a seven by seven
input space. The first tests were conducted using patterns of ten digits
obtained from five different persons, each of whom wrote one sample of
each of the ten digits on a standard Fortran coding form. The written
digits were enlarged by projection on a screen and a seven by seven grid
form was used to determine the actual input patterns used. Each test
digit was centered on the seven by seven grid form and a digit was
defined to be "in" a grid space if it entered in such a manner that both
sides of the line could be seen inside that space. The tests, which are
detailed in Appendix VI, were conducted using the patterns obtained in
this manner. In one of these tests, the Madaline was trained four times
on each of the fifty different input patterns and was then asked to
classify each input pattern. All input patterns were classified
correctly.

The second part of this investigation was again conducted with the
minimum number of Adalines required. Six Adalines, each with a seven by
seven input space, were used to classify the 45 Fortran symbols. The
results of this classification are detailed in Appendix VII.
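
The "minimum number required" in both parts of the investigation follows from the 2^m output codes available to a Madaline of m Adalines. A small Python check, with a hypothetical helper name `minimum_adalines`:

```python
def minimum_adalines(num_classes):
    """Smallest m such that 2**m >= num_classes (one output code per class)."""
    return (num_classes - 1).bit_length()


print(minimum_adalines(10))  # ten numeric characters
print(minimum_adalines(45))  # 45 Fortran characters
```

Four Adalines provide 16 codes for the ten digits and six provide 64 codes for the 45 symbols, whereas three and five Adalines would provide only 8 and 32 codes respectively.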
16. Summary and Acknowledgements.

This paper contains a survey of the pattern recognition problem as
approached through the application of adaptive linear neuron devices.
Some of the basic Adaline concepts were verified experimentally and some
concepts were amplified. In particular, an example was shown of the
inability of the minimum square error adaption procedure to separate
input patterns that were linearly separable, and a method was developed
to display the convergence characteristics of the individual adaption
procedures. In addition, preliminary tests to determine the feasibility
of utilizing a Madaline for character recognition indicated that a
possible application exists in this field.

An interesting project, which is a natural follow up of the work
detailed herein, would be that of using an actual laboratory set up to
determine more completely the feasibility of utilizing Madaline for
character recognition. Perhaps this could be accomplished using
photocells for input pattern detection. Another carry over project using
Adalines would be to continue the weather forecasting analysis
previously discussed [6]. The wealth of weather data available at this
location makes this a particularly feasible project.

The  authors  wish  to  acknowledge  the  guidance  and  assistance  given 
to  them  by  Dr.  J.  R.  Ward. 
APPENDIX I


FORTRAN  STATEMENTS  OF  ADAPTION  PROCEDURES 
The  Fortran  statements  used  to  simulate  the  four  adaption  procedures 
are  listed  below.  The  program  was  written  for  an  Adaline  having  17 
weight  elements.  It  is  to  be  noted  that  this  is  not  the  complete  program 
but  only  the  adaption  techniques. 

Abbreviations used in the program:

ADAPT      Adaption level
BETA       Increment constant
BMSE       Bound for minimum square error
DELTA      Dead zone level
PAT(I,J)   Input pattern, I=pattern element, J=input pattern number
PIE        Proportional constant
SGN(J)     Desired binary output of pattern J
SUM        Analog output of pattern J
WEY(I)     Value of weight element I

C     MINIMUM SQUARE ERROR ADAPTION
      ERROR=SGN(J)*DELTA-SUM
      ABER=ABSF(ERROR)
      IF(ABER-BMSE) 100,100,50
   50 ENORM=PIE*(ERROR/17.0)
      DO 60 I=1,17
   60 WEY(I)=WEY(I)+ENORM*PAT(I,J)
  100 CONTINUE

C     INCREMENTAL ADAPTION
      IF(SGN(J)) 256,254,254
  254 IF(SUM) 258,258,255
  255 IF(SUM-DELTA) 258,262,262
  256 IF(SUM) 257,259,259
  257 IF(SUM+DELTA) 262,262,259
  258 SIGN=+1.0
      GO TO 260
  259 SIGN=-1.0
  260 DO 261 I=1,17
  261 WEY(I)=WEY(I)+SIGN*BETA*PAT(I,J)
  262 CONTINUE

C     RELAXATION ADAPTION
      IF(SGN(J)) 306,304,304
  304 IF(SUM) 310,310,305
  305 IF(SUM+.0005-DELTA) 310,312,312
  306 IF(SUM) 307,310,310
  307 IF(SUM-.0005+DELTA) 312,312,310
  310 ERROR=SGN(J)*DELTA-SUM
      ENORM=PIE*(ERROR/17.0)
      DO 311 I=1,17
  311 WEY(I)=WEY(I)+ENORM*PAT(I,J)
  312 CONTINUE

C     MODIFIED RELAXATION ADAPTION
      IF(SGN(J)) 356,354,354
  354 IF(SUM) 360,360,355
  355 IF(SUM-DELTA) 360,362,362
  356 IF(SUM) 357,360,360
  357 IF(SUM+DELTA) 362,362,360
  360 ERROR=SGN(J)*ADAPT-SUM
      ENORM=PIE*(ERROR/17.0)
      DO 361 I=1,17
  361 WEY(I)=WEY(I)+ENORM*PAT(I,J)
  362 CONTINUE
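
For readers unfamiliar with the arithmetic IF statement, the four listings above can be restated in Python. This is a reading of the Fortran fragments, not the original program: the function names and the rule names are hypothetical, the desired output is assumed to be +1 or -1, and the divisor 17.0 is generalized to the number of weights.

```python
def analog_output(weights, pattern):
    """SUM in the listings: weighted sum of the pattern elements."""
    return sum(w * p for w, p in zip(weights, pattern))


def adapt(weights, pattern, desired, rule,
          pie=1.0, beta=1.0, delta=30.0, adapt_level=40.0, bmse=1.0):
    """Apply one adaption step in place; return True if the weights changed."""
    n = len(weights)
    total = analog_output(weights, pattern)
    if rule == "minimum_square_error":
        error = desired * delta - total
        if abs(error) <= bmse:            # within the bound: no adaption
            return False
        step = pie * error / n
    elif rule == "incremental":
        if desired * total >= delta:      # outside the dead zone, correct sign
            return False
        step = desired * beta             # fixed increment toward desired sign
    elif rule == "relaxation":
        if desired * total >= delta - 0.0005:
            return False
        step = pie * (desired * delta - total) / n
    elif rule == "modified_relaxation":
        if desired * total >= delta:
            return False
        step = pie * (desired * adapt_level - total) / n
    else:
        raise ValueError(rule)
    for i, p in enumerate(pattern):
        weights[i] += step * p
    return True


weights = [0.0] * 17
pattern = [1, -1] * 8 + [1]   # hypothetical 17-element pattern of +1/-1
adapt(weights, pattern, +1, "minimum_square_error")
print(round(analog_output(weights, pattern), 6))
```

With pattern elements of +1 and -1 and a proportional constant of 1.00, a single minimum square error step moves the analog output of the imposed pattern to the adaption level, and a single modified relaxation step moves it to the (higher) adaption level ADAPT.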
APPENDIX II


EXAMPLES  OF  ADAPTION  PROCEDURES 
An  example  of  pattern  separation  was  solved  using  each  of  the 
four  different  adaption  procedures  to  illustrate  their  characteristics. 
The  test  patterns  were: 


[Figure: eight 4 x 4 test patterns, numbered (1) through (8). Patterns
(1) through (4) form the (+1) class and patterns (5) through (8) form
the (-1) class.]


When  a  pattern  is  imposed  on  an  Adaline,  corrections  are  made  to  the 
weights  so  that  its  analog  output  satisfies  the  training  criteria.  This, 
however,  has  a  tendency  to  partially  destroy  some  of  the  effects  of 
previous  training.  In  an  attempt  to  illustrate  this  process,  the  analog 
output  of  the  Adaline  was  examined  after  each  adaption  throughout  this 
test. 


The fixed conditions of this experiment were:

Weights: Continuously variable with initial values of 0.0
Pattern sequence: 1, 2, 3, 4, 5, 6, 7, 8
Proportional constant: 1.00
Minimum square error adaption level: 30.00
Minimum square error adaption bound: ±1.00
Incremental constant: 1.00
Adaption dead zone: 30.00
Adaption level: 40.00

Minimum Square Error Adaption

Number of iterations required to converge: 64

Weight elements after convergence:

   3.436    7.864   10.197   -5.141   -5.773    -.725
   9.090     .228  -11.528  -17.821  -12.380  -16.595
 -15.091    -.400   -8.237   -9.718    4.442

Analog output of each pattern after convergence:

Pattern   Analog output
   1         29.819
   2         29.385
   3         29.031
   4         29.368
   5        -29.846
   6        -29.529
   7        -30.063
   8        -30.000

Incremental Adaption

Number of iterations required to converge: 82

Weight elements after convergence:

   4.000   10.000   14.000   -6.000   -6.000   -2.000
  10.000     .000  -14.000  -22.000  -14.000  -22.000
 -16.000   -2.000  -10.000  -12.000    6.000

Analog output of each pattern after convergence:

Pattern   Analog output
   1         30.000
   2         46.000
   3         30.000
   4         30.000
   5        -42.000
   6        -34.000
   7        -46.000
   8        -30.000

Relaxation Adaption

Number of iterations required to converge: 168

Weight elements after convergence:

   3.540    7.881   10.304   -5.277   -5.918    -.730
   9.141     .285  -11.706  -18.106  -12.534  -16.846
 -15.344    -.401   -8.312   -9.827    4.516

Analog output of each pattern after convergence:

Pattern   Analog output
   1         30.000
   2         30.000
   3         30.000
   4         30.000
   5        -30.000
   6        -30.000
   7        -30.000
   8        -30.000

Note: The criterion of the relaxation adaption procedure is that the
magnitude of the analog output for each pattern be equal to or greater
than the adaption dead zone. In this particular example the magnitude
of each pattern analog output after training was exactly equal to the
adaption dead zone. This would not normally be expected.

Modified Relaxation Adaption

Number of iterations required to converge: 31

Weight elements after convergence:

   3.896    9.918   12.819   -5.197   -6.787   -1.319
   9.970     .057  -13.871  -20.536  -14.237  -20.485
 -16.814   -1.542   -9.604  -11.569    5.735

Analog output of each pattern after convergence:

Pattern   Analog output
   1         34.697
   2         37.435
   3         30.138
   4         30.844
   5        -36.076
   6        -37.091
   7        -40.000
   8        -33.365
APPENDIX III


MINIMUM SQUARE ERROR ADAPTION AS A FUNCTION OF PROPORTIONAL CONSTANT


An investigation of minimum square error adaption was conducted to
determine the effect of the proportional constant. The number of
iterations required for convergence and the resultant value of the
weight elements were examined. Fixed conditions of the experiment were:

Adaption level: 30.00
Bound for minimum square error: ±1.00
Weights: Continuously variable with initial values of 0.0
Pattern sequences: A. 1, 2, 3, 4, 5, 6, 7, 8
                   B. 1, 5, 2, 6, 3, 7, 4, 8
                   C. Random


Training patterns

[Figure: four pairs of training pattern sets on a 4 x 4 grid, labeled
X & J, C & T, X & O and U & T. Each pair comprises eight patterns,
numbered (1) through (8), divided into a (+1) class and a (-1) class.]



NUMBER OF ITERATIONS REQUIRED TO CONVERGE FOR MINIMUM SQUARE
ERROR ADAPTION

PATTERN                        PROPORTIONAL CONSTANT
PAIR     SEQUENCE   .25   .50   .75  1.00  1.25  1.50  1.75  1.90  2.00

X&J         A       434   188   104    64    42    72   162   455    *
            B       423   182   109    66    49    71   162   434    *
            C       440   200   136    96    72    99   160   392    *

C&T         A       143    68    37    47    35    52   132   353    *
            B       149    68    46    49    41    49   123   343    *
            C       139    72    56    40    72    80   168   383    *

X&O         A       297   129    73    45    45    69   122   360    *
            B       300   144    84    54    47    66   145   367    *
            C       312   152    96    72    72    85   143   366    *

U&T         A        96    55    35    27    37    58    92   259    *
            B        97    51    35    31    45    60   149   397    *
            C       104    56    40    45    40    72   120   280    *

* denotes nonconvergence

RESULTANT WEIGHT ELEMENTS AFTER CONVERGENCE FOR X & J PATTERNS

PROPORTIONAL  PATTERN
CONSTANT      SEQUENCE   WEIGHT VALUES AFTER CONVERGENCE

   .25           A         3.454    7.661   10.010   -5.114   -5.758    -.737
                           8.868     .265  -11.380  -17.576  -12.174  -16.368
                         -14.892    -.425   -8.074   -9.559    4.389

  1.00           A         3.436    7.864   10.197   -5.141   -5.773    -.725
                           9.090     .228  -11.528  -17.821  -12.380  -16.595
                         -15.091    -.400   -8.237   -9.718    4.442

  1.90           A         3.519    7.936   10.373   -5.306   -5.979    -.859
                           9.161     .202  -11.658  -18.137  -12.682  -16.912
                         -15.342    -.448   -8.136   -9.811    4.694

   .25           B         3.446    7.622    9.991   -5.137   -5.742    -.713
                           8.869     .285  -11.352  -17.575  -12.162  -16.328
                         -14.895    -.394   -8.059   -9.548    4.380

  1.00           B         3.406    7.670   10.244   -5.334   -5.698    -.663
                           9.199  [illegible]  -11.464  -17.932  -12.542  -16.403
                         -15.285    -.259   -8.121   -9.846    4.478

  1.90           B         3.569    7.950   10.375   -5.356   -6.014    -.920
                           9.144     .220  -11.739  -18.160  -12.729  -16.966
                         -15.378    -.418   -8.154   -9.782    4.650

   .25           C         3.426    7.672   10.099   -5.116   -5.740  [illegible]
                           8.878     .259  -11.357  -17.573  -12.172  -16.367
                         -14.878    -.389   -3.063   -9.527    4.394

  1.00           C         3.467    7.803   10.130   -5.193   -5.853    -.674
                           9.104     .290  -11.560  -17.826  -12.455  -16.525
                         -15.247    -.310   -8.210   -9.705    4.423

  1.90           C         3.657    7.762   10.126   -5.357   -5.943    -.775
                           9.026     .270  -11.689  -18.095  -12.358  -16.831
                         -15.239    -.394   -3.358   -9.690    4.380

COMMENTS


1.  The proportional constant strongly affects the number of iterations
required to converge. A minimum number of iterations was required when
the proportional constant was approximately equal to one. The adaption
process will not converge if the proportional constant is equal to two.

2.  For each pattern pair, the final values of the weight elements were
approximately the same regardless of the proportional constant chosen
or the sequence of the training patterns. Only a representative sampling
of resultant weights is included in the data, but all the results
supported the above conclusion.
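
The nonconvergence at a proportional constant of two can be made plausible with a single-pattern sketch in Python. This is a simplified illustration and not the experiment reported above: it assumes one training pattern with elements of +1 or -1, ignores the minimum square error bound, and uses the hypothetical function name `error_history`. Under those assumptions each adaption multiplies the error by (1 - PIE).

```python
def error_history(pie, steps=5, n_weights=17, target=30.0):
    """Errors seen on successive impositions of one +1/-1 pattern."""
    pattern = [1.0] * n_weights        # hypothetical pattern, all elements +1
    weights = [0.0] * n_weights
    errors = []
    for _ in range(steps):
        total = sum(w * p for w, p in zip(weights, pattern))
        error = target - total
        errors.append(error)
        step = pie * error / n_weights  # ENORM in the Fortran listing
        weights = [w + step * p for w, p in zip(weights, pattern)]
    return errors


for pie in (0.5, 1.0, 2.0):
    print(pie, [round(e, 3) for e in error_history(pie)])
```

With PIE = 1 the error vanishes after one step; with PIE = 0.5 it halves each time; with PIE = 2 it merely changes sign and keeps its magnitude, so the bound is never reached. With several patterns the adaptions interact, which is presumably why the smallest iteration counts in the table do not all fall at exactly 1.00.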
APPENDIX IV


ADALINE RESPONSE TO SIMILAR PATTERNS


An Adaline was trained, until it had converged to a solution, on the
ten patterns shown below. After convergence, 50 "similar" patterns were
imposed upon the trained Adaline, the "similar" patterns being generated
by randomly changing one pattern element of the training patterns. This
test was conducted using each of the four adaption procedures in turn.

Training Patterns

[Figure: ten training patterns, numbered (1) through (10), divided into
a +1 class and a -1 class.]

The fixed conditions of the experiment were:

Weights: Continuously variable with initial values of 0.0
Pattern sequence: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Proportional constant: 1.00
Minimum square error bound: ±1.00
Adaption dead zone: 30.00
Incremental constant: 1.00
Adaption level: 40.00

                            Range of Analog Output   Range of Analog Output
Adaption       Number of    for Training Patterns    for Similar Patterns
Procedure      Iterations       +          -             +          -

Minimum           103         30.128     29.076        35.657     19.268
Square Error                  29.587     30.837         6.508     38.684

Incremental        29         62.000     36.000        58.000     18.000
                              36.000     52.000        18.000     64.000

Relaxation         99         42.274     30.000        43.564     17.099
                              30.000     38.701        17.171     49.652

Modified           19         52.642     35.249        55.708     17.367
Relaxation                    30.534     46.580        17.156     57.823
COMMENTS


The range of the analog outputs for the similar patterns is dependent
upon the final trained values of the weight elements. In this example,
one pattern element was changed in the generation of each similar
pattern, and this had the effect of changing the sign of the contribution
of the corresponding weight element. Thus the largest change, in this
case, would occur when the pattern bit corresponding to the largest
weight element is changed in sign.

All of the analog outputs of the similar patterns were of the same
sign as the desired binary output. However, if similar patterns had
been formed by changing more than one pattern element, it is possible
that some of the similar analog outputs would be of a different sign
than the desired pattern binary output.
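
The effect described in these comments can be illustrated with a short
Python sketch. It assumes the analog output is the plain weighted sum of
pattern elements valued +1 or -1; any bias or threshold weight is omitted.
The weights, the pattern, and the function names are hypothetical and are
not taken from the experiment.

```python
def analog_output(weights, pattern):
    """Weighted sum of the pattern elements (each element is +1 or -1)."""
    return sum(w * x for w, x in zip(weights, pattern))


def flip_element(pattern, index):
    """Return a copy of the pattern with one element changed in sign."""
    similar = list(pattern)
    similar[index] = -similar[index]
    return similar


# Hypothetical trained weights and training pattern, for illustration only.
weights = [4.0, -2.0, 6.0, 1.0]
pattern = [1, -1, 1, 1]

base = analog_output(weights, pattern)

# Change in analog output when each element in turn is flipped.
changes = [
    analog_output(weights, flip_element(pattern, i)) - base
    for i in range(len(pattern))
]

# Each change equals -2 * w_i * x_i, so the largest swing comes from the
# element attached to the weight of largest magnitude.
largest = max(range(len(changes)), key=lambda i: abs(changes[i]))
```

Under these assumptions a single changed element moves the output by twice
the magnitude of its weight, so the sign of the output is preserved whenever
the trained output exceeds that amount in magnitude.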
APPENDIX V


THE  RESPONSE  OF  AN  ADALINE  TO  TRAINING  PATTERNS  THAT  ARE  NOT  LINEARLY 

SEPARABLE 

Two pattern classes were chosen so that they were not linearly
separable. This was accomplished by assigning the same pattern to both the
(+1) and the (-1) class. There were a total of 100 input patterns, each
of which was composed of nine pattern elements. The training patterns
used  in  this  experiment  are  tabulated  in  JYj » 

The experiment consisted of two parts. The first was an attempt to
separate the two pattern classes by imposing all the training patterns
on the Adaline. This was performed for both 500 and 5000 iterations.
Second, a random sample of ten patterns was chosen and the Adaline
was trained on these patterns for 200 iterations or until it had converged
to a solution. Then the remaining 90 patterns were imposed upon the
trained Adaline. The fixed conditions for this experiment are the same
as those listed in Appendix IV.

The results are listed in the following table. For this experiment,
a pattern was defined to be not correctly identified if its analog output
was either of an opposite sign to the desired pattern binary output, or
if the analog output was zero.
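
The identification rule just stated can be sketched in Python as follows.
The function name and the sample values are hypothetical; only the rule
itself (wrong sign or zero output counts as not correctly identified) is
taken from the text.

```python
def not_correctly_identified(analog_outputs, desired_outputs):
    """Count patterns whose analog output is zero or has the wrong sign."""
    count = 0
    for analog, desired in zip(analog_outputs, desired_outputs):
        # desired is +1 or -1; the product is positive only when the
        # analog output is non-zero and agrees in sign with the target.
        if analog * desired <= 0:
            count += 1
    return count


# Hypothetical analog outputs and desired binary outputs.
analog = [12.5, -3.0, 0.0, 7.25, -30.0]
desired = [1, 1, -1, -1, -1]
missed = not_correctly_identified(analog, desired)

# The same pattern placed in both classes yields one analog output but two
# opposite targets, so at least one of the pair is always counted.
duplicate_missed = not_correctly_identified([5.0, 5.0], [1, -1])
```

The last two lines show why a set containing the same pattern in both
classes can never be identified without error under this rule.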
Number of Patterns Not Correctly Identified

Number of   Number of                  Adaption Procedures
Training    Iterations   Min. Sq.   Inc.   Relaxation   Modified
Patterns                 Error                          Relaxation

100           500        13          4      1            7
100          5000        13          2      5            5
 10*          200        11          9      9            9
 10*          200        14         10     10           10
 10*          200        14          9     10            9
 10*          200        22         11     11            7

* Different random samples of ten patterns.
APPENDIX VI


NUMERIC  RECOGNITION  TESTS 


A computer simulated Madaline consisting of four Adalines was
employed to attempt the separation of the ten numeric characters into the
ten different classes, zero through nine. Ten hand written digits (one
set) were obtained from each of five persons, and were converted into
patterns in a seven by seven input space. The computer program was
written in Fortran and executed on a CDC 1604 digital computer using the
Minimum Square Error adaption scheme with a proportionality constant of
one and an adaption dead zone of 30. The following output coding scheme
was used in this investigation:

Digit   Adaline 1   Adaline 2   Adaline 3   Adaline 4

0        1           1           1           1
1       -1          -1          -1           1
2        1          -1          -1
3
4
5
6
7
8
9

A series of tests was run in which one, two or more sets of
patterns were used to train the Madaline, after which the Madaline was
asked to recognize and classify all fifty input patterns. The number
of training iterations performed prior to the classification check, and
the number of digits recognized out of the total of fifty imposed on
the system are included in the data tables below.


No. of patterns   No. of training            No. of patterns   No. of patterns
trained on        iterations                 checked           classified

1 set of 10       4 iterations per pattern   50                36
2 sets of 10      4 iterations per pattern   50                36
3 sets of 10      4 iterations per pattern   50                44
4 sets of 10      4 iterations per pattern   50                45
5 sets of 10      4 iterations per pattern   50                50


No. of patterns   No. of training            No. of patterns   No. of patterns
trained on        iterations                 checked           classified

1 set of 10       200 total iterations       50                31
2 sets of 10      200 total iterations       50                38
3 sets of 10      200 total iterations       50                44
4 sets of 10      200 total iterations       50                44
5 sets of 10      200 total iterations       50                50


The results of this test show that the system must be trained on
all patterns to ensure that it will be capable of classifying all the
patterns. However, training on just one set gave the system the
capability of recognizing many of the other patterns submitted to it for
classification. Of interest is the suggestion that repeated iterations
do not necessarily improve the ability of the system to classify the
imposed inputs.
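
The comparison behind the last remark can be worked through in a short
Python sketch using the counts from the two data tables. Two assumptions
are made: that "4 iterations per pattern" means a total of 4 times the
number of training patterns, and that the first entry of the second table
reads 31. The variable and function names are hypothetical.

```python
CHECKED = 50

# Patterns classified, keyed by the number of training sets of 10 patterns.
per_pattern = {1: 36, 2: 36, 3: 44, 4: 45, 5: 50}  # 4 iterations per pattern
fixed_total = {1: 31, 2: 38, 3: 44, 4: 44, 5: 50}  # 200 total iterations


def rate(classified, checked=CHECKED):
    """Percentage of the checked patterns that were classified."""
    return 100.0 * classified / checked


def total_iterations(sets, per_pattern_iterations=4, patterns_per_set=10):
    """Assumed total iterations for the 'per pattern' runs."""
    return per_pattern_iterations * patterns_per_set * sets


# Cases in which the 200-iteration run used more iterations than the
# per-pattern run yet classified fewer patterns.
worse_with_more = [
    sets
    for sets in sorted(per_pattern)
    if total_iterations(sets) < 200 and fixed_total[sets] < per_pattern[sets]
]
```

Under these assumptions the runs trained on one set and on four sets
classified fewer patterns after 200 iterations than after the shorter
training, which is consistent with the remark above.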
APPENDIX VII


FORTRAN  SYMBOL  CLASSIFICATION 

The sole purpose of this test was to determine whether or not 45
hand printed Fortran symbols could be separated by a computer simulated
Madaline consisting of the minimum number of Adalines that could
theoretically accomplish this task. Six Adalines, each with a seven by
seven input space, were utilized to accomplish the desired separation
after the hand written characters had been converted into input patterns.
The computer program was written in Fortran and executed on a CDC 1604
digital computer using the Minimum Square Error adaption technique. A
proportionality constant of one and an adaption dead zone of 30 were
used for this test.
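
The minimum number of Adalines follows from counting output combinations:
n Adalines with binary outputs give 2^n distinct codes, and five Adalines
give only 32, which is fewer than 45. A minimal Python sketch of this
count follows; the function name is hypothetical.

```python
def minimum_adalines(num_classes):
    """Smallest n such that 2**n output combinations cover num_classes."""
    n = 0
    while 2 ** n < num_classes:
        n += 1
    return n


fortran_symbols = minimum_adalines(45)  # 2**5 = 32 < 45 <= 64 = 2**6
numeric_digits = minimum_adalines(10)   # 2**3 = 8 < 10 <= 16 = 2**4
```

The same count gives four Adalines for the ten numeric characters of
Appendix VI.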

After extensive manipulation of the Madaline output coding schemes,
the 45 characters were properly classified. Nine thousand training
iterations were executed prior to the classification of the 45 input
patterns. The 45 Fortran symbols and the output coding scheme which
accomplished the desired classification are as follows:


Pattern   Adaline   Adaline   Adaline   Adaline   Adaline   Adaline
          One       Two       Three     Four      Five      Six

0         -1         1         1         1        -1         1
1          1         1         1         1        -1         1
2         -1         1        -1         1        -1         1
3         -1         1        -1        -1         1         1
4          1         1         1         1         1        -1
5          1         1         1        -1        -1        -1
6         -1         1        -1         1         1         1
7          1         1         1        -1        -1         1
Pattern 

8 

9 

A 

B 

C 

0 

E 

F 

G 

H 

I 

J 

K 

L 

M 

N 

0 

P 

Q 

R 

S 

T 

U 

V 

w 

X 


Adaline 

One 

Adaline 

Two 

Adaline 

Three 

1 

1 

4 

1 

1 

“1 

“1  , 

1 

1 

1 

«>  J 

1 

"1 

1 

1 

1 

4 

1 

I 

=  1 

1 

1 

=>  1 

1 

=  1 

1 

=  1 

1 

“1 

1 

4 

1 

1 

1 

l 

=>1 

1 

“1 

1 

1 

“1 

1 

1 

1 

1 

1 

•  1 

-1 

1 

1 

1 

1 

-l 

1 

1 

“  1 

1 

i 

1 

-1 

1 

1 

1 

1 

1 

1 

“1 

=  1 

=  1 

1 

=  1 

1 

I 

®  1 

1 

1 

Adaline 

Four 

Adaline 

Five 

Adaline 

Six 

l 

1 

«  1 

1 

1 

=  1 

1 

1 

1 

1 

1 

=>1 

~1 

1 

1 

“1 

1 

1 

1 

-1 

«=>  1 

1 

=  1 

I 

“1 

“  1 

“1 

°  1 

=>1 

1 

1 

1 

-1 

1 

1 

1 

4 

=>  1 

=  1 

°  1 

1 

1 

°1 

=  1 

1 

=  1 

1 

1 

1 

1 

4 

1 

l 

1 

1 

=>  1 

=  1 

1 

=  1 

1 

1 

<=1 

°1 

1 

l 

1 

1 

=>1 

<=1 

-1 

=  1 

=  1 

l 

•  1 
Pattern  Adaline  Adaline 

One  Two 

Adaline 

Three 

Adaline 

Four 

Adaline 

Five 

Adaline 

Six 

Y  -1 

1 

1 

1 

=4 

“1 

Z  -1 

“  1 

l 

=  1 

=>  1 

4 

+  1 

1 

=  1 

1 

1 

1 

&3  a  ^ 

“  1 

1 

“1 

1 

1 

/  °1 

°  1 

1 

1 

“1 

GO 

(  =1 

-1 

1 

“1 

1 

-l 

)  °l 

(Z> 

1 

1 

=  1 

1 

“1 

-1 

I 

=  1 

o  °1 

-1 

“1 

“1 

=  1 

1 

“  °1 

4 

=  1 

l 

■=1 

1 

a  | 

=  1 

=  1 

1 

l 

1 

The final output coding scheme was obtained mainly by trial and
error methods. However, an effort was made to use certain characteristics
of the individual input patterns to facilitate the desired classification.
Specifically, at least one Adaline was trained to produce the same output
for each of the following sets of input patterns:

a) Patterns with a long vertical line in the left hand column. Examples: E, L, P

b) Patterns with other long vertical lines. Examples: 4, I, T

c) Patterns with a long horizontal line. Examples: E, H, I

d) Patterns with large circles. Examples: 0, Q

e) Patterns with small circles. Examples: 8, 9, P

f) Patterns with small horizontal lines. Examples: A, +, =

g) Patterns with small vertical lines. Examples: 5, +

h) Patterns with left to right slant lines. Examples: N, V, X

i) Patterns with right to left slant lines. Examples: K, Z, /

j) Patterns which were smaller than the others. Examples: »»°
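
Whatever grouping is used, a coding scheme can only separate the patterns
if no two patterns receive the same six-element code. A minimal Python
sketch of such a check follows, applied to the codes for patterns 0 through
7. The function name is hypothetical, and the sign of each code element as
transcribed here is an assumption.

```python
def duplicate_codes(coding_scheme):
    """Return groups of patterns that share the same output code."""
    by_code = {}
    for pattern, code in coding_scheme.items():
        by_code.setdefault(tuple(code), []).append(pattern)
    return [group for group in by_code.values() if len(group) > 1]


# Codes for patterns 0 through 7 (Adaline One through Adaline Six).
SCHEME = {
    "0": (-1, 1, 1, 1, -1, 1),
    "1": (1, 1, 1, 1, -1, 1),
    "2": (-1, 1, -1, 1, -1, 1),
    "3": (-1, 1, -1, -1, 1, 1),
    "4": (1, 1, 1, 1, 1, -1),
    "5": (1, 1, 1, -1, -1, -1),
    "6": (-1, 1, -1, 1, 1, 1),
    "7": (1, 1, 1, -1, -1, 1),
}

clashes = duplicate_codes(SCHEME)

# A hypothetical extra symbol given the same code as "0" is reported.
with_clash = dict(SCHEME, extra=(-1, 1, 1, 1, -1, 1))
reported = duplicate_codes(with_clash)
```

A check of this kind would flag a trial coding scheme that could not work
before any training iterations are spent on it.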